10 research outputs found

    Specialising Parsers for Queries

    Many software systems consist of data processing components that analyse large datasets to gather information and learn from them. Often, only part of the data is relevant for analysis, so data processing systems contain an initial preprocessing step that filters out the unwanted information. While efficient data analysis techniques and methodologies are accessible to non-expert programmers, data preprocessing seems to be forgotten, or worse, ignored, even though real performance gains are possible by preprocessing data efficiently. Implementations of the data preprocessing step traditionally have to trade modularity for performance: to achieve modularity, one separates the parsing of raw data from the filtering of it, which leads to slow programs because intermediate objects are created during execution. The efficient version is a low-level implementation that interleaves parsing and querying. In this dissertation we demonstrate a principled and practical technique to convert the modular, maintainable program into its interleaved, efficient counterpart. Key to achieving this objective is the removal, or deforestation, of intermediate objects in a program execution. We first show that by encoding data types using Böhm-Berarducci encodings (often referred to as Church encodings), and combining these with partial evaluation of function composition, we achieve deforestation. This allows us to implement optimisations themselves as libraries, with minimal dependence on an underlying optimising compiler. Next we illustrate the applicability of this approach to parsing and preprocessing queries. The approach is general enough to cover top-down and bottom-up parsing techniques, and deforestation of pipelines of operations on lists and streams. We finally present a set of transformation rules that, given a parser for a nested data format and a query on the parsed structure, produce a parser specialised for the query. As a result we preserve the modularity of writing parsers and queries separately while also minimising resource usage. These transformation rules combine deforested implementations of both libraries to yield an efficient, interleaved result.
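    To make the deforestation idea concrete, here is a minimal Scala sketch (illustrative names only, not the dissertation's actual library) of Böhm-Berarducci-encoded lists: a list is represented by its own fold, so map and filter compose as plain function composition and a whole pipeline traverses the input once without building intermediate lists.

```scala
// Hypothetical sketch: a Böhm-Berarducci-encoded list is its own right fold.
trait ChurchList[A] {
  def fold[B](cons: (A, B) => B, nil: B): B
}

object ChurchList {
  def fromList[A](xs: List[A]): ChurchList[A] = new ChurchList[A] {
    def fold[B](cons: (A, B) => B, nil: B): B = xs.foldRight(nil)(cons)
  }

  implicit class Ops[A](self: ChurchList[A]) {
    // map merely rewrites the `cons` argument; no intermediate list exists
    def map[B](f: A => B): ChurchList[B] = new ChurchList[B] {
      def fold[C](cons: (B, C) => C, nil: C): C =
        self.fold[C]((a, acc) => cons(f(a), acc), nil)
    }
    // filter fuses into the same single traversal
    def filter(p: A => Boolean): ChurchList[A] = new ChurchList[A] {
      def fold[C](cons: (A, C) => C, nil: C): C =
        self.fold[C]((a, acc) => if (p(a)) cons(a, acc) else acc, nil)
    }
    def toList: List[A] = self.fold[List[A]](_ :: _, Nil)
  }
}

// ChurchList.fromList(xs).map(f).filter(p).toList walks xs exactly once.
```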

    Fold-based fusion as a library: a generative programming pearl

    Fusion is a program optimisation technique commonly implemented using special-purpose compiler support. In this paper, we present an alternative approach, implementing fold-based fusion as a standalone library. We use staging to compose operations on folds; the operations are partially evaluated away, yielding code that does not construct unnecessary intermediate data structures. The technique extends to partitioning and grouping of collections.
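    A sketch of the fold-based representation (hypothetical names; the paper additionally stages these closures so they are partially evaluated away): a collection is represented by its foldLeft, so map and filter rewrite the step function instead of building intermediate collections.

```scala
// Illustrative sketch: a collection represented as its foldLeft.
trait FoldLeft[A] { outer =>
  def apply[S](z: S)(step: (S, A) => S): S

  // map rewrites the step function; no new collection is built
  def map[B](f: A => B): FoldLeft[B] = new FoldLeft[B] {
    def apply[S](z: S)(step: (S, B) => S): S =
      outer(z)((s, a) => step(s, f(a)))
  }

  // filter fuses into the same loop
  def filter(p: A => Boolean): FoldLeft[A] = new FoldLeft[A] {
    def apply[S](z: S)(step: (S, A) => S): S =
      outer(z)((s, a) => if (p(a)) step(s, a) else s)
  }
}

object FoldLeft {
  def range(lo: Int, hi: Int): FoldLeft[Int] = new FoldLeft[Int] {
    def apply[S](z: S)(step: (S, Int) => S): S = {
      var s = z; var i = lo
      while (i < hi) { s = step(s, i); i += 1 }
      s
    }
  }
}

// Sum of the even squares below 100, computed in a single loop:
// FoldLeft.range(0, 100).map(x => x * x).filter(_ % 2 == 0)(0)(_ + _)
```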

    Accelerating parser combinators with macros

    Parser combinators provide an elegant way of writing parsers: parser implementations closely follow the structure of the underlying grammar, while accommodating interleaved host language code for data processing. However, the host language features used for composition introduce substantial overhead, which leads to poor performance. In this paper, we present a technique to systematically eliminate this overhead. We use Scala macros to analyse the grammar specification at compile time and remove the composition overhead, leaving behind an efficient top-down, recursive-descent parser. We compare our macro-based approach to a staging-based approach using the LMS framework, and provide an experience report in which we discuss the advantages and drawbacks of both methods. Our library outperforms Scala's standard parser combinators on a set of benchmarks by an order of magnitude, and is 2x faster than code generated by LMS.
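    The overhead in question is easy to see in a stripped-down combinator sketch (illustrative, not the paper's API): every use of sequencing and map below allocates closures, tuples and wrapper objects at parse time, which is exactly the structure macros can inline away at compile time.

```scala
// Illustrative sketch, not the paper's API.
case class ParseResult[+A](value: Option[A], rest: String)

trait Parser[+A] { self =>
  def parse(in: String): ParseResult[A]

  // Sequencing allocates a closure and a tuple per successful parse.
  def ~[B](that: => Parser[B]): Parser[(A, B)] = new Parser[(A, B)] {
    def parse(in: String): ParseResult[(A, B)] =
      self.parse(in) match {
        case ParseResult(Some(a), rest) =>
          that.parse(rest) match {
            case ParseResult(Some(b), rest2) => ParseResult(Some((a, b)), rest2)
            case _                           => ParseResult(None, in)
          }
        case _ => ParseResult(None, in)
      }
  }

  // map wraps every result in yet another object.
  def map[B](f: A => B): Parser[B] = new Parser[B] {
    def parse(in: String): ParseResult[B] = {
      val r = self.parse(in)
      ParseResult(r.value.map(f), r.rest)
    }
  }
}

object Digits {
  val digit: Parser[Char] = new Parser[Char] {
    def parse(in: String): ParseResult[Char] =
      if (in.nonEmpty && in.head.isDigit) ParseResult(Some(in.head), in.tail)
      else ParseResult(None, in)
  }

  // Grammar-shaped and readable -- but allocation-heavy at parse time.
  val twoDigits: Parser[Int] =
    (digit ~ digit).map { case (a, b) => (a.toString + b).toInt }
}
```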

    What are the Odds? Probabilistic programming in Scala

    Probabilistic programming is a powerful high-level paradigm for probabilistic modeling and inference. We present Odds, a small domain-specific language (DSL) for probabilistic programming, embedded in Scala. Odds provides first-class support for random variables and probabilistic choice, while reusing Scala's abstraction and modularity facilities for composing probabilistic computations and for executing deterministic program parts. Odds accurately represents possibly dependent random variables using a probability monad that models committed choice. This monadic representation of probabilistic models can be combined with a range of inference procedures. We present engines for exact inference, rejection sampling and importance sampling with look-ahead, but other types of solvers are conceivable as well. We evaluate Odds on several non-trivial probabilistic programs from the literature and we demonstrate how the basic probabilistic primitives can be used to build higher-level abstractions, such as rule-based logic programming facilities, using advanced Scala features.
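    For readers unfamiliar with the underlying structure, here is the textbook probability monad in Scala (a deliberately simplified sketch; Odds itself uses a committed-choice monad and richer inference engines): random variables are weighted outcome lists, flatMap composes dependent choices, and exact inference is enumeration.

```scala
// Simplified sketch: the classic distribution monad, not Odds'
// committed-choice representation.
case class Dist[A](outcomes: List[(A, Double)]) {
  def flatMap[B](f: A => Dist[B]): Dist[B] =
    Dist(for ((a, p) <- outcomes; (b, q) <- f(a).outcomes) yield (b, p * q))

  def map[B](f: A => B): Dist[B] = flatMap(a => Dist.always(f(a)))

  // Exact inference: total weight of outcomes satisfying the predicate.
  def pr(p: A => Boolean): Double =
    outcomes.collect { case (a, w) if p(a) => w }.sum
}

object Dist {
  def always[A](a: A): Dist[A] = Dist(List((a, 1.0)))
  def flip(p: Double): Dist[Boolean] = Dist(List((true, p), (false, 1.0 - p)))
}

object CoinExample {
  // Two choices composed monadically; P(both heads) = 0.25.
  val bothHeads: Dist[Boolean] = for {
    a <- Dist.flip(0.5)
    b <- Dist.flip(0.5)
  } yield a && b

  val answer: Double = bothHeads.pr(identity) // 0.25
}
```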

    ExPASy: SIB bioinformatics resource portal

    ExPASy (http://www.expasy.org) has a worldwide reputation as one of the main bioinformatics resources for proteomics. It has now evolved, becoming an extensible and integrative portal accessing many scientific resources, databases and software tools in different areas of life sciences. Scientists can henceforth seamlessly access a wide range of resources in many different domains, such as proteomics, genomics, phylogeny/evolution, systems biology, population genetics, transcriptomics, etc. The individual resources (databases, web-based and downloadable software tools) are hosted in a ‘decentralized’ way by different groups of the SIB Swiss Institute of Bioinformatics and partner institutions. Specifically, a single web portal provides a common entry point to a wide range of resources developed and operated by different SIB groups and external institutions. The portal features a search function across ‘selected’ resources. Additionally, the availability and usage of resources are monitored. The portal is aimed at both expert users and people who are not familiar with a specific domain in life sciences. The new web interface provides, in particular, visual guidance for newcomers to ExPASy.

    Optimizing data structures in high-level programs: New directions for extensible compilers based on staging

    High-level data structures are a cornerstone of modern programming and at the same time stand in the way of compiler optimizations. In order to reason about user- or library-defined data structures, compilers need to be extensible. Common mechanisms to extend compilers fall into two categories. Frontend macros, staging or partial evaluation systems can be used to programmatically remove abstraction and specialize programs before they enter the compiler. Alternatively, some compilers allow extending their internal workings by adding new transformation passes at different points in the compile chain or by adding new intermediate representation (IR) types. None of these mechanisms alone is sufficient to handle the challenges posed by high-level data structures. This paper shows a novel way to combine them to yield benefits that are greater than the sum of the parts.
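    As a toy illustration of the first mechanism (a hypothetical sketch, vastly simpler than a real staging framework such as LMS): programs are built as an intermediate representation, and smart constructors partially evaluate abstraction away before any compiler pass runs.

```scala
// Hypothetical sketch of staging-style specialisation via smart constructors.
sealed trait Exp
case class Const(value: Int)         extends Exp
case class Sym(name: String)         extends Exp
case class Plus(lhs: Exp, rhs: Exp)  extends Exp
case class Times(lhs: Exp, rhs: Exp) extends Exp

object Exp {
  // Smart constructors simplify the IR as it is built.
  def plus(a: Exp, b: Exp): Exp = (a, b) match {
    case (Const(x), Const(y)) => Const(x + y)
    case (Const(0), e)        => e
    case (e, Const(0))        => e
    case _                    => Plus(a, b)
  }
  def times(a: Exp, b: Exp): Exp = (a, b) match {
    case (Const(x), Const(y))          => Const(x * y)
    case (Const(1), e)                 => e
    case (e, Const(1))                 => e
    case (Const(0), _) | (_, Const(0)) => Const(0)
    case _                             => Times(a, b)
  }
}

// A generic power function, unrolled and simplified at IR-construction time:
object Power {
  def power(base: Exp, n: Int): Exp =
    if (n == 0) Const(1) else Exp.times(base, power(base, n - 1))

  // power(Sym("x"), 3) == Times(Sym("x"), Times(Sym("x"), Sym("x")))
}
```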

    Go Meta! A Case for Generative Programming and DSLs in Performance Critical Systems

    Most performance-critical software is developed using very low-level techniques. We argue that this needs to change, and that generative programming is an effective avenue to enable the use of high-level languages and programming techniques in many such circumstances.